Fix a query performance issue with tables without a PK. by mildbyte · Pull Request #561 · splitgraph/sgr

mildbyte · 2021-11-04T12:28:48Z

When partitioning data, if the table doesn't have a primary key, we use a "surrogate"
primary key by concatenating the whole row as a string on the PG side (this is because
the whole row can sometimes contain NULLs which we can't compare in PG).

The query planner used to use the non-textual composite PK to figure out which
chunks overlapped in order to materialize them. This meant false positives where
something like (2, 'orange') would force a materialization because numerically,
it's greater than 10 and lexicographically, it isn't.

To fix this, regenerate the textual surrogate PK at query plan time and use it
to figure out when chunks overlap.

When partitioning data, if the table doesn't have a primary key, we use a "surrogate" primary key by concatenating the whole row as a string on the PG side (this is because the whole row can sometimes contain NULLs which we can't compare in PG). The query planner used to use the non-textual composite PK to figure out which chunks overlapped in order to materialize them. This meant false positives where something like (2, 'orange') would force a materialization because numerically, it's greater than 10 and lexicographically, it isn't. To fix this, regenerate the textual surrogate PK at query plan time and use it to figure out when chunks overlap.

* Splitfile speedups (#567) * Various query speedups (#563, #561) * More robust CSV querying (#562) Full set of changes: [`v0.2.17...v0.2.18`](v0.2.17...v0.2.18)

mildbyte merged commit c0e3fed into master Nov 4, 2021

mildbyte deleted the bugfix/no-pk-overlapping-chunks branch November 4, 2021 12:28

mildbyte added a commit that referenced this pull request Nov 17, 2021

Bump version: 0.2.17 → 0.2.18

cc98bd8

* Splitfile speedups (#567) * Various query speedups (#563, #561) * More robust CSV querying (#562) Full set of changes: [`v0.2.17...v0.2.18`](v0.2.17...v0.2.18)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix a query performance issue with tables without a PK.#561

Fix a query performance issue with tables without a PK.#561
mildbyte merged 1 commit intomasterfrom
bugfix/no-pk-overlapping-chunks

mildbyte commented Nov 4, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mildbyte commented Nov 4, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant